## Chapter 12. Semiconductor Memory

## Chapter Overview

- Memory Classification
- Memory Architectures
- The Memory Core
- Periphery
- Reliability

## Introduction

□ Memory stores data values and program instructions, e.g. cache memories, RAM/ROM, USB flash drives.

□ Dense data-storage circuitry is one of the primary concerns of a digital circuit or system designer.

□ Using register cells (e.g. D flip-flops) for memory → excessively large area, not feasible.

□ An array structure is used for memory to increase storage density and to reduce the overhead caused by peripheral circuitry.

□ Memory design concerns: robustness, performance, power.

### Semiconductor Memory Classification

- □ ROM (read-only memory): data cannot be modified once written
- ✓ Data is encoded into the circuit topology
- ✓ Nonvolatile: stored data is not lost when the supply voltage is off
- □ RWM (read-write memory): data is stored in flip-flops (static) or as charge on a capacitor (dynamic, needs refreshing)
- ✓ Volatile: data is lost when the supply voltage is off
- ✓ Non-random-access types: FIFO (first-in first-out), LIFO (last-in first-out), shift register, CAM (content-addressable memory)
- □ NVRWM: nonvolatile read-write memory
- ✓ EPROM: erasable programmable read-only memory
- ✓ E<sup>2</sup>PROM: electrically erasable programmable read-only memory
- ✓ Flash memory

| RWM: random access | RWM: non-random access                | NVRWM                               | ROM                                    |
|--------------------|---------------------------------------|-------------------------------------|----------------------------------------|
| SRAM<br>DRAM       | FIFO<br>LIFO<br>Shift register<br>CAM | EPROM<br>E<sup>2</sup>PROM<br>Flash | Mask-programmed<br>Programmable (PROM) |

### Memory Architecture: Decoders

□ Intuitive architecture for an N×M memory (N words of M bits): one select signal S<sub>i</sub> per word, and only one S<sub>i</sub> can be 1 at any time. If N=2<sup>20</sup>, it needs 2<sup>20</sup>=1048576 select signals.

□ Inserting an address decoder reduces the number of external select signals to K=log<sub>2</sub>N address bits. For N=2<sup>20</sup>, only 20 address bits are needed.
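The saving from the decoder can be sketched in a few lines of Python. This is only a behavioral model, not a circuit description; the function names `address_bits` and `decode` are illustrative and not part of the original material.

```python
def address_bits(num_words):
    """Return K, the number of address bits needed to select one of num_words words."""
    if num_words < 1:
        raise ValueError("num_words must be positive")
    return (num_words - 1).bit_length()


def decode(address, num_words):
    """Behavioral 1-out-of-N decoder: returns the one-hot select signals S_0..S_{N-1}."""
    if not 0 <= address < num_words:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(num_words)]


print(address_bits(2**20))  # 20
print(decode(2, 4))         # [0, 0, 1, 0]
```

Exactly one select signal is 1 for any valid address, which is the property the intuitive architecture requires.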



### Array-Structured Memory Architecture

□ Word line: horizontal select line that enables a single row of cells.

□ Bit line: wire connecting the cells in a column to the input/output circuitry.

□ For smaller area, the number of transistors in each cell is reduced.

□ Voltage swing on the bit lines is reduced to substantially below $V_{dd}$ to reduce delay and power → a sense amplifier is needed to restore the signal to full rail-to-rail amplitude.



### Array-Structured Memory Architecture

□ Problem: the previous design has an extremely large aspect ratio (height >> width).

□ Solution: arrange the memory array so that the vertical and horizontal dimensions are almost equal → aspect ratio ≈ 1

- ✓ multiple words are stored in a single row and selected simultaneously
- ✓ a column decoder routes the correct word to the input/output terminals
- ✓ the address is split into a column address (A<sub>0</sub>~A<sub>K-1</sub>) and a row address (A<sub>K</sub>~A<sub>L-1</sub>), with L the total number of address bits
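The split of an address word into row and column fields can be illustrated as follows. This is a minimal sketch that assumes the K low-order bits form the column address and the remaining bits form the row address; `split_address` is a hypothetical helper name.

```python
def split_address(address, total_bits, column_bits):
    """Split an address into (row, column), with the column in the low-order bits."""
    if not 0 <= column_bits <= total_bits:
        raise ValueError("column_bits must lie between 0 and total_bits")
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit in total_bits")
    column = address & ((1 << column_bits) - 1)  # bits A_0 .. A_{K-1}
    row = address >> column_bits                 # bits A_K .. A_{L-1}
    return row, column


row, column = split_address(0b10110110, total_bits=8, column_bits=3)
print(row, column)  # 22 6
```

The row value drives the row decoder (word line), while the column value drives the column decoder that picks one word out of the selected row.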



### Hierarchical Memory Architecture

- □ Large memories suffer from speed degradation due to very long word and bit lines.
- □ Solution: partition memory to P smaller blocks
- ✓ An extra block address selects 1 of P blocks to be read or written.



Advantages:

1. Shorter wires within blocks
2. Block address activates only 1 block → power savings

### Memory Timing: Definitions

□ Read-access time: delay between read request and the moment data is available at output.

□ Write-access time: time elapsed between a write request and the final writing of input data into memory.

□ (Read or write) cycle time: minimum time required between successive reads or writes.



# Memory Core Design: ROM Memories

□ ROM cell design: 0 or 1 should be presented to bit line upon activation of its word line.

 $\square$  MOS ROM cell design: bit line is resistively clamped to Vdd  $\rightarrow$  default output=1.

□ Absence of a transistor between WL and BL: WL=1, BL=1 (default value)  $\rightarrow$  "1" is stored.

□ Providing a MOS transistor between WL and BL: WL=1, transistor is ON, BL is shorted to GND  $\rightarrow$  "0" is stored.



### Memory Timing: Approaches

□ DRAM timing: multiplexed addressing. The upper and lower halves of the address word are presented sequentially on the address bus.

- ✓ RAS (row-address strobe): asserted when the MSB part of the address is present
- ✓ CAS (column-address strobe): asserted when the LSB part of the address is present

□ SRAM timing: self-timed approach



### MOS NOR ROM

□ Example 4×4 MOS ROM array

✓ PMOS load is used to pull up bit lines in case none of attached NMOS devices is enabled.

✓ GND lines are shared between 2 consecutive WL lines.



### MOS NOR ROM

□ MOS NOR ROM: combination of a bit line, PMOS pull-up and NMOS pull-downs constitutes pseudo-NMOS NOR gate with word lines as inputs  $\rightarrow$  MOS NOR ROM.

□ N×M ROM: a combination of M NOR gates with at most N inputs (for a fully populated column).

✓ Only 1 word line goes "1"  $\rightarrow$  at most 1 of pull-down devices is ON.

✓ Resistance of pull-up device must be larger than pulldown resistance to ensure adequate low level of logic "0"
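The logical behavior of the NOR ROM can be sketched with a small behavioral model. It ignores all electrical effects (ratioed levels, delay), and the 4×4 transistor map below is made-up example data, not the contents of the array in the slides.

```python
def nor_rom_read(transistor_map, row):
    """Read one word from a NOR ROM.

    transistor_map[r][c] is 1 if an NMOS pull-down connects word line r
    to bit line c, and 0 if the transistor is absent.
    """
    num_rows = len(transistor_map)
    word_lines = [1 if r == row else 0 for r in range(num_rows)]
    bit_lines = []
    for c in range(len(transistor_map[0])):
        # Each bit line is a NOR gate: any enabled pull-down forces it to 0,
        # otherwise the pull-up keeps it at the default value 1.
        pulled_down = any(word_lines[r] and transistor_map[r][c]
                          for r in range(num_rows))
        bit_lines.append(0 if pulled_down else 1)
    return bit_lines


# Hypothetical 4x4 programming pattern (1 = transistor present).
EXAMPLE_MAP = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

print(nor_rom_read(EXAMPLE_MAP, 0))  # [0, 1, 1, 0]
print(nor_rom_read(EXAMPLE_MAP, 3))  # [1, 1, 1, 1]
```

A present transistor therefore stores a 0 and an absent one stores a 1, so the word read out is the bitwise complement of the selected row of the transistor map.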



### MOS NOR ROM Layout

□ Layout of 4×4 NOR ROM (contact-mask programming)

- ✓ Constructed by repeating the same cell in both the X and Y directions
- ✓ Odd cells are mirrored around the horizontal axis to share GND
- ✓ Memory is programmed by selective addition of metal-to-diffusion contacts
- ✓ Presence of a metal contact to BL: 0-cell; absence: 1-cell
- ✓ Only 1 layer (contact mask) is used to program the memory array
- ✓ Programming of the memory can be delayed to one of the last process steps

## MOS NOR ROM Layout

□ Layout of 4×4 NOR ROM (threshold-raising programming)

- ✓ Thresholds of some transistors are selectively raised above $V_{dd}$ (e.g. to 7 V for $V_{dd}$ = 5 V) by implanting extra p-type impurities
- ✓ Transistors with the raised V<sub>th</sub> can never be turned ON → equivalent to being eliminated



Threshold raising implants disable transistors

### MOS NAND ROM

□ MOS NAND ROM (word lines operated in reverse-logic)

✓ All word lines are 1 by default, only for selected row WL=0

✓ All transistors on nonselected rows are turned ON

✓ If no transistor is present at the intersection, all other transistors in the chain are ON since their WL=1 → the bit line is pulled down, BL=0.

✓ If a transistor is present at the intersection, it is turned off by the selected WL=0 → the chain is broken, BL=1.



All word lines high by default with exception of selected row
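The reverse-logic read of the NAND ROM can be sketched in the same behavioral style. The model assumes an ideal series chain per bit line, with a missing transistor treated as a short circuit; the example map is hypothetical.

```python
def nand_rom_read(transistor_map, row):
    """Read one word from a NAND ROM.

    transistor_map[r][c] is 1 if a transistor sits at (word line r, bit line c),
    and 0 if that position is a short circuit.
    """
    num_rows = len(transistor_map)
    # Reverse logic: every word line is 1 except the selected one.
    word_lines = [0 if r == row else 1 for r in range(num_rows)]
    bit_lines = []
    for c in range(len(transistor_map[0])):
        # The series chain conducts only if every position conducts:
        # a short always conducts, a transistor conducts when its WL is 1.
        chain_conducts = all(
            transistor_map[r][c] == 0 or word_lines[r] == 1
            for r in range(num_rows)
        )
        bit_lines.append(0 if chain_conducts else 1)
    return bit_lines


# Hypothetical 4x4 programming pattern (1 = transistor present).
EXAMPLE_MAP = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

print(nand_rom_read(EXAMPLE_MAP, 0))  # [1, 0, 0, 1]
```

In this organization a present transistor stores a 1 and a missing one stores a 0, which is the opposite of the NOR ROM convention.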

### MOS NAND ROM Layout

□ Advantage of MOS NAND ROM: no contact to Vdd or GND is needed → cell size is reduced substantially.

□ Eliminating a transistor means replacing it with a short circuit → threshold-lowering implant using n-type impurities (depletion transistor, always on regardless of the WL value).

□ Drawback: loss in performance compared to NOR ROM.

### ROM Transient Performance

□ Transient response of a memory array: the time from the moment a word line switches until the bit line has traversed a certain voltage swing $\Delta V$.

□ Since the bit line feeds into a sense amplifier, it does not need to traverse the full voltage swing → generally $\Delta V = 0.5$ V.

□ Most of the delay of a memory array is attributable to interconnect parasitic resistance and capacitance.



Resistance/cell: $(8.5/4) \times 0.07\ \Omega/\square = 0.15\ \Omega$ (which is negligible)

Wire capacitance/cell: $(8.5\lambda \times 4\lambda)(0.6)^2 \times 0.031 + 2 \times (8.5\lambda \times 0.6) \times 0.044 = 0.83$ fF

Drain capacitance/cell: $\left((3\lambda \times 4\lambda)(0.6)^2 \times 0.3 + 2 \times 3\lambda \times 0.6 \times 0.8\right) \times 0.375 + 4\lambda \times 0.6 \times 0.43 = 2.6$ fF
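The per-cell numbers can be reproduced with a short script. It assumes that λ = 0.6 µm (the factor 0.6 in the expressions) and interprets the coefficients as area capacitance (fF/µm²), perimeter or sidewall capacitance (fF/µm), a junction-capacitance scaling factor (0.375) and an overlap capacitance (fF/µm); these interpretations and the function names are assumptions made for the sketch.

```python
LAMBDA_UM = 0.6  # assumed value of lambda in micrometers


def wire_resistance_ohm():
    """Resistance of one cell's wire segment: number of squares times sheet resistance."""
    return (8.5 / 4) * 0.07


def wire_capacitance_ff():
    """Area plus fringe capacitance of one cell's wire segment, in fF."""
    area_um2 = (8.5 * LAMBDA_UM) * (4 * LAMBDA_UM)
    fringe_length_um = 2 * (8.5 * LAMBDA_UM)
    return area_um2 * 0.031 + fringe_length_um * 0.044


def drain_capacitance_ff():
    """Scaled junction (bottom plus sidewall) and overlap capacitance, in fF."""
    area_um2 = (3 * LAMBDA_UM) * (4 * LAMBDA_UM)
    sidewall_um = 2 * 3 * LAMBDA_UM
    junction = (area_um2 * 0.3 + sidewall_um * 0.8) * 0.375
    overlap = 4 * LAMBDA_UM * 0.43
    return junction + overlap


print(f"{wire_resistance_ohm():.2f} ohm")
print(f"{wire_capacitance_ff():.2f} fF")
print(f"{drain_capacitance_ff():.2f} fF")
```

The results round to the 0.15 Ω, 0.83 fF and 2.6 fF quoted in the text.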



### **Propagation Delay of NOR ROM**

### Word line delay

Consider the  $512 \times 512$  case. The delay of the distributed *rc*-line containing M cells can be approximated using the expressions derived in Chapter 8.

 $t_{word} = 0.38 (r_{word} \times c_{word}) M^2 = 0.38 (35 \ \Omega \times (0.65 + 5.1) \text{ fF}) 512^2 = 20 \text{ nsec}$ 
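As a quick numerical check of the word-line expression, the sketch below evaluates the distributed rc-line approximation with the per-cell values used in the equation (35 Ω and 0.65 + 5.1 fF). The function name is illustrative.

```python
def distributed_rc_delay(r_per_cell, c_per_cell, num_cells):
    """Delay of a distributed rc-line: 0.38 * r * c * M^2 (seconds)."""
    return 0.38 * r_per_cell * c_per_cell * num_cells ** 2


t_word = distributed_rc_delay(35.0, (0.65 + 5.1) * 1e-15, 512)
print(f"t_word = {t_word * 1e9:.1f} ns")  # about 20 ns
```

Because the delay grows with the square of the number of cells, halving the row length cuts the word-line delay to a quarter.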

### Bit line delay

Assume a (2.4/1.2) pull-down device and a (8/1.2) pull-up transistor. The bit line switches between 5 V and 2.5 V.

$C_{bit} = 512 \times (2.6 + 0.8)\ \text{fF} = 1.7\ \text{pF}$

$I_{avHL} = \frac{1}{2}\left(\frac{2.4}{0.9}\right)(19.6 \times 10^{-6})\left(\frac{(4.25)^2}{2} + \left(4.25 \times 3.75 - \frac{(3.75)^2}{2}\right)\right) - \frac{1}{2}\left(\frac{8}{0.9}\right)(5.3 \times 10^{-6})\left(4.25 \times 1.25 - \frac{(1.25)^2}{2}\right) = 0.36\ \text{mA}$

$t_{HL} = (1.7\ \text{pF} \times 1.25\ \text{V}) / 0.36\ \text{mA} = 5.9\ \text{nsec}$

The low-to-high response time can be computed using a similar approach.

$t_{LH} = (1.7\ \text{pF} \times 1.25\ \text{V}) / 0.36\ \text{mA} = 5.9\ \text{nsec}$
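The bit-line numbers can be checked the same way. The sketch below simply evaluates the expressions for the bit-line capacitance, the average discharge current and the high-to-low delay as written; the variable and function names are illustrative.

```python
def bit_line_capacitance(num_cells=512, drain_ff=2.6, wire_ff=0.8):
    """Total bit-line capacitance in farads."""
    return num_cells * (drain_ff + wire_ff) * 1e-15


def average_discharge_current():
    """Average pull-down current minus average pull-up current, in amperes."""
    pull_down = 0.5 * (2.4 / 0.9) * 19.6e-6 * (
        4.25 ** 2 / 2 + (4.25 * 3.75 - 3.75 ** 2 / 2)
    )
    pull_up = 0.5 * (8 / 0.9) * 5.3e-6 * (4.25 * 1.25 - 1.25 ** 2 / 2)
    return pull_down - pull_up


c_bit = bit_line_capacitance()
i_av = average_discharge_current()
t_hl = c_bit * 1.25 / i_av  # time to move the bit line by half of the 2.5 V swing

print(f"C_bit = {c_bit * 1e12:.2f} pF")
print(f"I_av  = {i_av * 1e3:.2f} mA")
print(f"t_HL  = {t_hl * 1e9:.1f} ns")
```

Without intermediate rounding the delay comes out at roughly 6 ns, consistent with the 5.9 nsec obtained from the rounded values 1.7 pF and 0.36 mA.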

□ Word-line delay is larger than bit-line delay due to the large resistance of the poly-Si wire.

- □ Methods to reduce word-line delay
- ✓ Partition the word line into multiple sections and insert buffers
- ✓ Drive the word line from both ends (reduces delay by a factor of 4)
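The factor of 4 follows from the quadratic dependence of the distributed rc-line delay on the number of cells M. A short derivation, assuming two identical drivers so that each one effectively drives only M/2 cells:

```latex
t_{word}(M) = 0.38\, r_{word}\, c_{word}\, M^2
\quad\Rightarrow\quad
t_{word}\!\left(\tfrac{M}{2}\right)
  = 0.38\, r_{word}\, c_{word} \left(\tfrac{M}{2}\right)^2
  = \tfrac{1}{4}\, t_{word}(M)
```

For the 512×512 example this would bring the word-line delay from about 20 nsec down to about 5 nsec.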



□ Disadvantages of the previous NAND/NOR ROMs

- ✓ Ratioed logic: V<sub>OL</sub> depends on the ratio of pull-up/pull-down devices
- ✓ Static power consumption: a static current path exists between V<sub>dd</sub> and GND when the output is low

□ Solution: use precharged (dynamic) logic



### ROM Memories – A User Perspective

□ Application-specific ROMs: the memory is part of a custom design and programmed for that particular application only.

□ Commodity ROM chips: the vendor mass-produces the memory and customizes it according to customer specifications.

- ✓ mask-programmable using a contact mask or an extra implant mask
- ✓ programming involves the manufacturer → undesirable delay

□ PROM (Programmable ROM): allows the customer to program the memory one time (write once) by means of fuses.

- ✓ a single error in programming makes the device unusable

□ NVRWM (Nonvolatile Read-Write Memory):

- ✓ Memory programming: selectively disabling/enabling memory cells by electrically altering the transistor threshold
- ✓ The modified threshold is retained indefinitely, even when power is off
- ✓ The programmed value can be erased and reprogrammed

### Nonvolatile RWM: Floating-gate Transistor (FAMOS)

 □ FAMOS: Floating-gate avalanche-injection MOS
 ✓ extra poly-Si strip inserted between gate and channel: floating gate.



### **EPROM**

□ EPROM: Erasable-Programmable Read-Only Memory

- ✓ erased by shining UV light on the cells through a transparent window in the package
- ✓ UV light generates electron-hole pairs in the oxide, making it slightly conductive
- ✓ extremely simple and dense, well suited to fabricating large memories at low cost

□ Disadvantages

- ✓ "off-system" erasure: the memory must be removed from the board and placed in an EPROM programmer for programming
- ✓ slow: erasure takes on the order of minutes; programming takes 5-10 µsec/word
- ✓ limited endurance: maximum 1000 erase/program cycles
- ✓ reliability issue: device threshold changes after many cycles

## Floating-Gate Transistor Programming

□ Applying a high voltage (15~20 V) between the source and the gate-drain terminals → high E-field → avalanche injection: electrons traverse the oxide and get trapped on the floating gate.

□ To turn on the device, a higher gate voltage is needed to overcome the effect of the trapped negative charge.

□ The threshold voltage is increased (~7 V). A 5 V gate voltage is not sufficient to turn on the transistor → transistor disabled.



### FLOTOX EEPROM

EEPROM (E<sup>2</sup>PROM): Electrically-Erasable Programmable Read-Only Memory

✓ use FLOTOX (floating-gate tunneling oxide) transistor

✓ a portion of dielectric separating the floating gate from the channel and drain is reduced in thickness to about 10nm

 $\checkmark$  Voltage of 10V (E=10<sup>9</sup>V/m) applied over thin insulator  $\rightarrow$  electrons travel to/from floating gate by Fowler-Nordheim tunneling

 $\checkmark$  Reversible  $\rightarrow$  erased by reversing the voltage applied during the writing process
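The quoted field strength follows directly from the applied voltage and the oxide thickness. A one-line check, assuming a uniform field across the thin oxide (the symbol $t_{ox}$ for the oxide thickness is introduced here for illustration):

```latex
E = \frac{V}{t_{ox}} = \frac{10\ \text{V}}{10\ \text{nm}}
  = \frac{10\ \text{V}}{10^{-8}\ \text{m}} = 10^{9}\ \text{V/m}
```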



### FLOTOX EEPROM

□ Bidirectionality of FLOTOX EEPROM: injecting electrons onto the floating gate raises V<sub>T</sub>, while the reverse operation lowers V<sub>T</sub>.

□ The resulting threshold voltage depends on the initial gate charge → threshold control problem

- ✓ Removing too much charge from the floating gate results in a depletion device → cannot be turned off when WL=0.
- ✓ Solution: add an extra transistor in series with the FLOTOX device; it serves as the access device during read operations, while the FLOTOX transistor performs the storage.



EEPROM cell during a read operation: when programmed, $V_T$ of the FLOTOX device is higher than $V_{DD}$, effectively disabling it. If not, it acts as a closed switch.

### Flash EEPROM

□ EEPROM: large area (2 transistors per cell); fabrication of the very thin oxide is challenging and costly. However, it offers high versatility and is more resilient against wear-out.

□ Flash EEPROM: combines density of EPROM and versatility of EEPROM, with cost and functionality between the two.

□ Flash EEPROM: programmed by hot-electron-injection, erasure by Fowler-Nordheim tunneling.

 $\square$  Erasure performed in bulk for complete chip $\rightarrow$  no need for extra access transistor of EEPROM.

 $\square$  Monitoring control hardware checks threshold during erasure, dynamically adjusting the erasure time  $\rightarrow$  no depletion device

### Flash EEPROM

ETOX flash cell (Intel): very thin tunneling gate oxide (10nm)
 ✓ different areas of gate oxide used for programming and erasure

- $\checkmark$  Programming: apply high voltage (12V) on gate and drain for a grounded source.
- ✓ Erasure: gate grounded and source at 12V.



# **Cross-sections of NVM cells**



Figure: cross-sections of Flash and EPROM cells (courtesy Intel).

## Characteristics of State-of-the-art NVM

|                                | EPROM [Tomita91]            | EEPROM [Terada89, Pashley89] | Flash EEPROM [Jinbo92]           |
|--------------------------------|-----------------------------|------------------------------|----------------------------------|
| Memory size                    | 16 Mbit (0.6 μm)            | 1 Mbit (0.8 μm)              | 16 Mbit (0.6 μm)                 |
| Chip size                      | 7.18 × 17.39 mm<sup>2</sup> | 11.8 × 7.7 mm<sup>2</sup>    | 6.3 × 18.5 mm<sup>2</sup>        |
| Cell size                      | 3.8 μm<sup>2</sup>          | 30 μm<sup>2</sup>            | 3.4 μm<sup>2</sup>               |
| Access time                    | 62 nsec                     | 120 nsec                     | 58 nsec                          |
| Erasure time                   | minutes                     | N.A.                         | 4 sec                            |
| Programming time/word          | 5 µsec                      | 8 msec/word, 4 sec/chip      | 5 µsec                           |
| Erase/write cycles [Pashley89] | 100                         | 10<sup>5</sup>               | 10<sup>3</sup>-10<sup>5</sup>    |

## Read-Write Memories (RAM)

### • STATIC (SRAM)

- Data stored as long as supply is applied
- Large (6 transistors/cell)
- Fast
- Differential

### • DYNAMIC (DRAM)

- Periodic refresh required
- Small (1-3 transistors/cell)
- Slower
- Single-ended

## 6-transistor CMOS SRAM Cell

- Generic SRAM cell: 6 transistors.
- ✓ Access to cell is enabled by word line
- ✓ Two bit lines (BL and !BL) are required to improve noise margins during read and write operations.



### CMOS SRAM Analysis (Write)

- □ CMOS SRAM write operation
- ✓ Assume 1 is stored in the cell (Q=1)
- ✓ A 0 is written in the cell by setting BL=0 and !BL=1.
- ✓ flip-flop changes state if devices are sized properly.



# CMOS SRAM Analysis (Read)



## 6T-SRAM — Layout

□ CMOS SRAM with 6 transistors

✓ area-hungry: besides the devices, it needs signal routing and connections to two bit lines, a word line and both supply rails.

Total area: 1092λ<sup>2</sup>

### Resistance-load SRAM Cell

□ Resistive-load SRAM: four transistors

- ✓ replaces the cross-coupled CMOS inverter pair by a pair of resistive-load NMOS inverters
- ✓ R<sub>L</sub> must be as high as possible to keep a reasonable noise margin NM<sub>L</sub> and to reduce static power consumption
- ✓ But if R<sub>L</sub> is very large, t<sub>pLH</sub> and the cell size also increase
- ✓ Solution for the large t<sub>pLH</sub>: precharge the bit lines to V<sub>DD</sub>

Static power dissipation: want R<sub>L</sub> large. Bit lines are precharged to V<sub>DD</sub> to address the t<sub>p</sub> problem.

## Dynamic Random-Access Memory (DRAM)

- □ Dynamic Random-Access Memory (DRAM): based on charge storage on a capacitor
- □ Needs periodic refresh to prevent the contents of memory cells from being corrupted by leakage.
- □ Refresh should occur every 1 to 4 msec.

### 3-Transistor DRAM Cell

□ 3T DRAM cell: derived from the resistive-load SRAM cell by eliminating the load resistors and removing the redundancy of BL and !BL → 3 transistors.

□ Two bit lines: BL1 (for write) and BL2 (for read; carries the inverse of the stored data).

□ Writing the cell: place the appropriate data value on BL1 and assert the write-word line (WWL). Data is retained as charge on capacitor C<sub>S</sub> once WWL is lowered. Storage transistor M2 is either on or off depending on the stored value.



### 3T-DRAM Layout

Total 3T DRAM cell area: 576λ<sup>2</sup>



### 3-Transistor DRAM Cell

□ Reading the cell: the read-word line (RWL) is raised to 1; storage transistor M2 is on or off depending on the stored value.

□ BL2 is either clamped to V<sub>DD</sub> with a load device or precharged to either $V_{DD}$ or $V_{DD}-V_{T}$.

□ The series connection of M2 and M3 pulls BL2 low when a 1 is stored. BL2 stays at 1 in the opposite case.

□ The cell is inverting: the inverse value of the stored signal is sensed on bit line BL2.

□ Refresh: read the stored data, put its inverse on BL1, and assert WWL, in consecutive order.



### **Properties of 3T-DRAM**


✓ In contrast to SRAM cell, no constraints exist on device ratios. The choice of device sizes is solely based on performance and reliability considerations.

✓ In contrast to other DRAM cells, reading 3T-cell contents is nondestructive. That is, the data value stored in the cell is not affected by a read.

✓ The value stored on the storage node X when writing a 1 equals $V_{WWL} - V_{Tn}$.

### 1-Transistor DRAM Cell

□ 1-Transistor DRAM cell:

✓ Write: the data value is placed on bit line BL and the word line is raised (WL=1). Depending on the data value, cell capacitance C<sub>S</sub> is either charged or discharged.



Write: $C_S$ is charged or discharged by asserting WL and BL.

Read: charge redistribution takes place between the bit line and the storage capacitance.

$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$

Voltage swing is small; typically around 250 mV.

## 1-Transistor DRAM Cell

□ 1-Transistor DRAM cell:

✓ Before a read operation, BL is precharged to $V_{PRE}$. When WL is raised to 1, charge redistribution occurs between the bit line and the storage capacitance.

✓  $C_{BL}$ : bit-line capacitance.  $V_{BL}$ : potential of bit-line after charge redistribution.  $V_{BIT}$ : initial voltage on CS.



$$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$$
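The charge-redistribution expression is easy to evaluate numerically. The sketch below uses illustrative values that are assumptions, not figures from the slides: C<sub>S</sub> = 50 fF, C<sub>BL</sub> = 1 pF, V<sub>PRE</sub> = 2.5 V, and stored levels of 5 V and 0 V.

```python
def bitline_swing(v_bit, v_pre, c_s, c_bl):
    """Bit-line voltage change after charge redistribution, in volts."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)


C_S = 50e-15    # assumed storage capacitance
C_BL = 1e-12    # assumed bit-line capacitance
V_PRE = 2.5     # assumed precharge voltage

swing_one = bitline_swing(5.0, V_PRE, C_S, C_BL)
swing_zero = bitline_swing(0.0, V_PRE, C_S, C_BL)
print(f"reading a 1: {swing_one * 1e3:+.0f} mV")
print(f"reading a 0: {swing_zero * 1e3:+.0f} mV")
```

With these values the swing is only about ±120 mV, which illustrates why the bit-line capacitance ratio dominates the signal available to the sense amplifier.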

# DRAM Cell Observations

- **1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.**
- DRAM memory cells are single ended in contrast to SRAM cells.
- The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
- Unlike 3T cell, 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
- When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than  $V_{DD}$ .

## 1-T DRAM Cell



Expensive in Area

### SEM of Poly-diffusion Capacitor 1T-DRAM



### Advanced 1T DRAM Cells



### Periphery

 $\Box$  Memory core trades performance and reliability for reduced area  $\rightarrow$  needs peripheral circuitry to recover both speed and electrical integrity

- □ Memory peripheral circuitry
- ✓ Decoders
- ✓ Sense Amplifiers
- ✓ Input/Output Buffers
- ✓ Control/Timing Circuitry

### Address Decoders - Row Decoders

□ 1-out-of-2<sup>M</sup> decoder: a collection of 2<sup>M</sup> complex logic gates organized in a regular and dense fashion

□ Ex: 10-input (A<sub>0</sub> to A<sub>9</sub>) address decoder, with A<sub>0</sub> taken as the most significant bit

✓ (N)AND decoder:

$WL_0 = \overline{A}_0 \overline{A}_1 \overline{A}_2 \overline{A}_3 \overline{A}_4 \overline{A}_5 \overline{A}_6 \overline{A}_7 \overline{A}_8 \overline{A}_9$

$WL_{511} = \overline{A}_0 A_1 A_2 A_3 A_4 A_5 A_6 A_7 A_8 A_9$

✓ NOR decoder (single-stage CMOS design):

$WL_0 = \overline{A_0 + A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7 + A_8 + A_9}$

$WL_{511} = \overline{A_0 + \overline{A}_1 + \overline{A}_2 + \overline{A}_3 + \overline{A}_4 + \overline{A}_5 + \overline{A}_6 + \overline{A}_7 + \overline{A}_8 + \overline{A}_9}$

□ For a 10-input NOR decoder using a pseudo-NMOS or dynamic gate for each row → 11×1024=11264 transistors in total.

□ It can be implemented in a regular and dense fashion, like a ROM design
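A behavioral sketch of the NOR decoder may help to see how each row gate is wired. It assumes A<sub>0</sub> is the most significant address bit (so that row 511 corresponds to A<sub>0</sub>=0 and all other bits 1) and counts K pull-down devices plus one load or precharge device per row; the function names are illustrative.

```python
def nor_decoder(address_bits):
    """Return all word lines for the address [A_0, ..., A_{K-1}], A_0 being the MSB."""
    k = len(address_bits)
    word_lines = []
    for row in range(2 ** k):
        row_bits = [(row >> (k - 1 - i)) & 1 for i in range(k)]
        # Row gate inputs: A_i where the row bit is 0, the complement of A_i where it is 1.
        nor_inputs = [a if rb == 0 else 1 - a
                      for a, rb in zip(address_bits, row_bits)]
        word_lines.append(0 if any(nor_inputs) else 1)
    return word_lines


def nor_decoder_transistors(k):
    """K pull-downs plus one load device for each of the 2^K rows."""
    return (k + 1) * 2 ** k


word_lines = nor_decoder([0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
print(word_lines.index(1))          # 511
print(nor_decoder_transistors(10))  # 11264
```

Only the gate whose inputs are all 0 keeps its word line high, so exactly one word line is selected for every address.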

### Dynamic Decoders with ROM Array Structure

□ Dynamic 2-to-4 NOR decoder: the word line of the selected row is 1, all other word lines are 0.

□ Dynamic 2-to-4 NAND decoder: the word line of the selected row is 0, all other word lines are 1 → "active low" signaling; put an inverting buffer between decoder and memory.



### NAND Decoder Using 2-input Pre-decoders

□ Decoder propagation delay: important for read/write access times

□ To make large decoders faster: use a predecoder, which decodes segments of the address in a first logic layer. A second layer of logic gates produces the final word-line signals.

□ Reason: splitting a complex gate into 2 or more logic layers often produces a faster and cheaper implementation.

□ Ex: 10-input NAND decoder:

$$WL_0 = \overline{A}_0 \overline{A}_1 \overline{A}_2 \overline{A}_3 \overline{A}_4 \overline{A}_5 \overline{A}_6 \overline{A}_7 \overline{A}_8 \overline{A}_9$$

$$= \overline{(A_0 + A_1)}\;\overline{(A_2 + A_3)}\;\overline{(A_4 + A_5)}\;\overline{(A_6 + A_7)}\;\overline{(A_8 + A_9)}$$
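The equivalence between direct decoding and two-level decoding with 2-input predecoders can be checked exhaustively on a small example. This is a logic-level sketch only; the helper names and the pairing of adjacent address bits are assumptions made for illustration.

```python
from itertools import product


def direct_word_line(bits, row_bits):
    """Single-level decode: the word line is 1 only if every address bit matches the row."""
    return int(all(b == r for b, r in zip(bits, row_bits)))


def predecode_pairs(bits):
    """First layer: each pair of address bits is decoded into 4 one-hot lines."""
    lines = []
    for i in range(0, len(bits), 2):
        pair_value = 2 * bits[i] + bits[i + 1]
        lines.append([1 if v == pair_value else 0 for v in range(4)])
    return lines


def predecoded_word_line(bits, row_bits):
    """Second layer: AND one predecoded line per pair (half as many gate inputs)."""
    lines = predecode_pairs(bits)
    selected = [lines[i // 2][2 * row_bits[i] + row_bits[i + 1]]
                for i in range(0, len(bits), 2)]
    return int(all(selected))


def check_equivalence(num_bits):
    """Compare both decoders for every address and every row."""
    return all(
        direct_word_line(bits, row) == predecoded_word_line(bits, row)
        for bits in product([0, 1], repeat=num_bits)
        for row in product([0, 1], repeat=num_bits)
    )


print(check_equivalence(4))  # True
```

For the 10-input example, the second-layer gates need only 5 inputs each, which is where the speed and area benefit comes from.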

### Dynamic Decoders with ROM Array Structure

□ NOR decoders are much faster, but consume more area and power than NAND decoders.

□ Propagation delay:

- ✓ $R_{on}$ of the devices in NOR decoders are in parallel → smaller RC delay
- ✓ $R_{on}$ of the devices in NAND decoders are in series → larger RC delay

□ Power consumption (after the word lines are precharged to 1):

- ✓ NOR decoder: only 1 WL stays at "1", all other WLs are discharged to 0 → more power
- ✓ NAND decoder: only 1 WL is discharged to 0, all other WLs stay at "1" → less power

### NAND Decoder Using 2-input Pre-decoders

- □ NAND decoder using 2-input predecoders:
- ✓ It reduces number of transistors required
- $\checkmark$  Number of inputs to NAND gates is halved  $\rightarrow$  delay reduced by factor of 4.

✓ Load on vertical address lines is halved (only 256 connections per line)  $\rightarrow$  reduced delay, simpler design.



### 4-to-1 Pass-transistor Based Column Decoder

□ Column/block decoder: a 2<sup>K</sup>-input multiplexer (K: size of the address word). Generally 2 implementations.

□ CMOS pass-transistor MUX

- ✓ Advantage: speed (t<sub>pd</sub> does not add to the overall memory access time), only 1 extra transistor in the signal path
- ✓ Disadvantage: large transistor count



### Decoder for Non-Random-Access Memories

- □ Non-random-access memory: no need for a full decoder.
- □ Example: in a serial-access memory, the decoder degenerates into an M-bit shift register, with M the number of rows.
- ✓ only 1 of the bits is "1" at a time (pointer)
- ✓ the pointer moves to the next position after every access operation
- ✓ C<sup>2</sup>MOS D-FF is used in the shift register



Decoder for circular shift-register. The R signal resets the pointer to the 1st position
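The pointer behavior of the shift-register decoder can be modeled in a few lines. This is a behavioral sketch only (no clocking or C<sup>2</sup>MOS detail); the class and method names are hypothetical.

```python
class PointerShiftRegister:
    """One-hot circular shift register used as a serial-access row selector."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.reset()

    def reset(self):
        """Model of the R signal: put the pointer back at the first position."""
        self.bits = [1] + [0] * (self.num_rows - 1)

    def advance(self):
        """Move the single 1 to the next position, wrapping around at the end."""
        self.bits = [self.bits[-1]] + self.bits[:-1]

    def selected_row(self):
        return self.bits.index(1)


pointer = PointerShiftRegister(4)
visited = []
for _ in range(5):
    visited.append(pointer.selected_row())
    pointer.advance()
print(visited)  # [0, 1, 2, 3, 0]
```

Each access simply advances the pointer, so no address bits and no combinational decoder are needed.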

### 4-to-1 Tree Based Column Decoder

- □ Tree decoder: no predecoder is required.
- ✓ No. of devices required is drastically reduced.  $N_{tree}$ =2<sup>K</sup>+2<sup>K-1</sup>+...+4+2=2×(2<sup>K</sup>-1)
- ✓ Disadvantage: delay increases quadratically with # of sections; prohibitive for large decoders.
- $\checkmark$  Solutions: buffers, progressive sizing, combination of tree and pass transistor approaches.
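The device count and the selection path of the tree decoder can be sketched as follows. The model assumes one pass transistor per branch at each tree level and takes the address bits least-significant first; both function names are illustrative.

```python
def tree_decoder_devices(k):
    """Number of pass transistors: 2^K + 2^(K-1) + ... + 4 + 2."""
    return sum(2 ** i for i in range(1, k + 1))


def tree_select(inputs, address_bits_lsb_first):
    """Route one of 2^K bit lines to the output, one tree level per address bit."""
    level = list(inputs)
    for bit in address_bits_lsb_first:
        # Each level keeps one line out of every adjacent pair.
        level = [level[i + bit] for i in range(0, len(level), 2)]
    return level[0]


print(tree_decoder_devices(2))                      # 6
print(tree_select(["BL0", "BL1", "BL2", "BL3"], [1, 0]))  # BL1
```

The selected signal passes through K series devices, one per level, which is the source of the delay penalty noted above for large decoders.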







### Single-to-Differential Conversion



### Latch-Based Sense Amplifier



- Initialized at its metastable point with EQ.
- Once an adequate voltage gap is created, the sense amp is enabled with SE.
- Positive feedback quickly forces the output to a stable operating point.

### **Open Bitline Architecture**



## DRAM Read Process with Dummy Cell



## DRAM Timing



## Single-Ended Cascode Amplifier



### Address Transition Detection



### **Reliability and Yield**

□ Semiconductor memories trade off noise margin for density and performance → highly sensitive to noise (crosstalk, supply noise)

□ High density and large die size cause yield problems

$$Y = 100 \times \frac{\text{Number of good chips on wafer}}{\text{Number of chips on wafer}}$$

$$Y = \left[\frac{1 - e^{-AD}}{AD}\right]^2$$

with A the chip area and D the defect density.

□ Increase yield using error correction and redundancy
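The yield model can be evaluated numerically to see how quickly yield drops with die area. The sketch below assumes A is the die area and D the defect density in matching units; the defect density and areas in the example are made-up values.

```python
import math


def yield_fraction(area, defect_density):
    """Yield model Y = [(1 - exp(-A*D)) / (A*D)]^2, returned as a fraction of 1."""
    ad = area * defect_density
    if ad == 0:
        return 1.0  # limit of the expression as A*D goes to 0
    return ((1 - math.exp(-ad)) / ad) ** 2


# Hypothetical example: 1 defect per cm^2, die areas in cm^2.
for area_cm2 in (0.25, 0.5, 1.0, 2.0):
    print(f"A = {area_cm2} cm^2 -> Y = {100 * yield_fraction(area_cm2, 1.0):.1f} %")
```

For A·D = 1 the model predicts a yield of about 40 %, which shows why large memory dies rely on redundancy to remain economical.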

## **Open Bit-line Architecture — Cross Coupling**



### Folded-Bitline Architecture



### Transposed-Bitline Architecture





### Programmable Logic Array

□ Programmable logic array (PLA): uses a two-level sum-of-products representation of a logic function

- □ Regular 2-level implementation of Boolean functions:
- ✓ 1<sup>st</sup> layer of gates implements the AND operations (product terms or minterms)
- ✓ 2<sup>nd</sup> layer implements the OR functions (sum terms)



### **Pseudo-Static PLA**

□ Left portion: AND-plane; each row is a pseudo-NMOS NOR gate that implements a product term on complemented inputs: $a \cdot b = \overline{\overline{a} + \overline{b}}$

□ Right portion: OR-plane; each column is a pseudo-NMOS NOR gate (hence an extra inverter is needed) that implements a sum term: $a + b = \overline{\overline{a + b}}$



### Dynamic PLA

□ Left portion: AND-plane; each row is a dynamic $\Phi$N NOR gate that implements a product term on complemented inputs: $a \cdot b = \overline{\overline{a} + \overline{b}}$

□ Right portion: OR-plane; each column is a dynamic $\Phi$N NOR gate (hence an extra inverter is needed) that implements a sum term: $a + b = \overline{\overline{a + b}}$
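The NOR-NOR structure can be illustrated with a small logic-level model. It ignores clocking and precharge entirely; the data layout (a dictionary per product term, a list of term indices per output) and the XOR example are assumptions chosen for the sketch.

```python
def nor(*inputs):
    """NOR gate: output is 1 only if every input is 0."""
    return 0 if any(inputs) else 1


def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a NOR-NOR PLA.

    and_plane: list of product terms, each a dict {input_index: required_value}.
    or_plane:  list of outputs, each a list of product-term indices.
    """
    products = []
    for term in and_plane:
        # AND of literals realized as a NOR of the complemented literals.
        nor_inputs = [1 - inputs[i] if value == 1 else inputs[i]
                      for i, value in term.items()]
        products.append(nor(*nor_inputs))
    outputs = []
    for term_indices in or_plane:
        # OR realized as a NOR followed by the extra inverter.
        outputs.append(1 - nor(*(products[t] for t in term_indices)))
    return outputs


# Hypothetical example: f = a * not(b) + not(a) * b  (XOR of inputs 0 and 1)
AND_PLANE = [{0: 1, 1: 0}, {0: 0, 1: 1}]
OR_PLANE = [[0, 1]]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla_eval(AND_PLANE, OR_PLANE, [a, b])[0])
```

Only the product terms that are actually needed appear in the AND-plane, in contrast to a ROM, which decodes every input combination.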



## Clock Signal Generation for self-timed dynamic PLA



### PLA Layout



## PLA versus ROM

□ Programmable Logic Array: a structured approach to random logic

- ✓ two-level logic implementation
- ✓ NOR-NOR (product of sums)
- ✓ NAND-NAND (sum of products)

**Identical to ROM!**

□ Main difference

- ✓ ROM: fully populated
- ✓ PLA: one element per minterm

□ Note: the importance of PLAs has drastically reduced

1. slow
2. better software techniques (multi-level logic synthesis)

## Semiconductor Memory Trends



Memory size as a function of time: ×4 every three years

# Semiconductor Memory Trends



Die size increases by a factor of 1.5 per generation, combined with a cell size reduction by a factor of 2.6 per generation.
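The two scaling factors multiply, which roughly accounts for the fourfold capacity growth per generation. A back-of-the-envelope check, assuming the array occupies a constant fraction of the die:

```latex
\frac{\text{cells}_{n+1}}{\text{cells}_{n}}
  \approx \underbrace{1.5}_{\text{die area}} \times \underbrace{2.6}_{\text{cell density}}
  = 3.9 \approx 4
```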

# Semiconductor Memory Trends



Technology feature size for different SRAM generations